The required datasets were procured from The World Bank Data website.
For this project, the following datasets have been accessed from the source for analyses -
Each of the above datasets consists of approximately 10000 observations.
The datasets were read using the pandas library and then merged into one complete dataset for analysis.
First, a list of criteria was created to drop the irrelevant data points. Then a function was created to clean the datasets, which can be seen below.
# Creating a list of criteria
not_needed=['East Asia & Pacific','Europe & Central Asia','Latin America & Caribbean','Middle East & North Africa','North America','South Asia','Sub-Saharan Africa', 'High income','Low & middle income','Low income','Lower middle income','Middle income','Upper middle income', 'World','Arab World','Central Europe and the Baltics','Caribbean small states','East Asia & Pacific (excluding high income)',
'Early-demographic dividend', 'Europe & Central Asia (excluding high income)', 'Euro area','European Union','Fragile and conflict affected situations', 'Heavily indebted poor countries (HIPC)','IBRD only', 'IDA & IBRD total', 'IDA total','IDA blend','IDA only',
'Latin America & Caribbean (excluding high income)', 'Least developed countries: UN classification','Late-demographic dividend','Middle East & North Africa (excluding high income)', 'OECD members','Other small states', 'Pacific island small states', 'Pre-demographic dividend','Post-demographic dividend','Sub-Saharan Africa (excluding high income)','Small states','East Asia & Pacific (IDA & IBRD countries)',
'Europe & Central Asia (IDA & IBRD countries)','Latin America & the Caribbean (IDA & IBRD countries)',
'Middle East & North Africa (IDA & IBRD countries)','South Asia (IDA & IBRD)',
'Sub-Saharan Africa (IDA & IBRD countries)']
# Filtering the data using the given function as per analyses requirements.
import pandas as pd
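def display_relevant(dataset):
    # Read the CSV and keep only the identifying columns and the years of interest
    data = pd.read_csv(dataset)
    data = data[['Country Name', 'Country Code', '2015', '2016', '2017', '2018', '2019']]
    # Drop rows with a missing value in any of the selected years
    data = data.dropna()
    # Locate the regional and income-group aggregates and drop them by index
    dirty_data = data[data['Country Name'].isin(not_needed)]
    dirty_data_indexed = dirty_data.index
    clean_data = data.drop(dirty_data_indexed)
    return clean_data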
Unemployment = display_relevant("Unemployment.csv")
Inflation = display_relevant("Inflation.csv")
Population = display_relevant("Population.csv")
GDP_Capita = display_relevant("GDP_Capita.csv")
Labor_Force = display_relevant("Labor Force.csv")
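The cleaning step above can be illustrated on a small in-memory table. This is a minimal sketch, not the project's data: the values are invented, `aggregates` is a shortened stand-in for the `not_needed` list, and a negated boolean mask is used in place of dropping by index, which should give the same rows.

```python
import pandas as pd

# Hypothetical miniature of a World Bank wide-format table (values invented).
sample = pd.DataFrame({
    "Country Name": ["Qatar", "World", "Spain", "Euro area"],
    "Country Code": ["QAT", "WLD", "ESP", "EMU"],
    "2015": [0.2, 5.6, 22.1, 10.9],
    "2016": [0.1, 5.7, None, 10.0],
})

# Shortened stand-in for the not_needed list.
aggregates = ["World", "Euro area"]

# Drop rows with a missing year, then keep only rows that are not aggregates.
cleaned = sample.dropna()
cleaned = cleaned[~cleaned["Country Name"].isin(aggregates)].reset_index(drop=True)
print(cleaned)
```

Note that `dropna()` removes a country entirely if any single year is missing, so a country with one gap (Spain in this sketch) disappears from every year.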
For analyses, the datasets were converted into a long format and merged together using the given function.
# Reshaping the data using the given function for analyses.
def display_long(df):
    # Turn the year columns into rows: one row per country and year
    df_long = pd.melt(df, id_vars=['Country Name', 'Country Code'], var_name='Year', value_name='Value').set_index(['Country Name', 'Country Code'])
    df_long = df_long.reset_index()
    return df_long
Unemployment_long = display_long(Unemployment)
Inflation_long = display_long(Inflation)
Population_long = display_long(Population)
GDP_Capita_long = display_long(GDP_Capita)
Labor_Force_long = display_long(Labor_Force)
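To make the wide-to-long reshaping concrete, here is a minimal sketch of `pd.melt` on a two-country table. The values and the name `long_df` are illustrative assumptions, not taken from the datasets.

```python
import pandas as pd

# Hypothetical wide table: one row per country, one column per year.
wide = pd.DataFrame({
    "Country Name": ["Qatar", "Spain"],
    "Country Code": ["QAT", "ESP"],
    "2015": [0.2, 22.1],
    "2016": [0.1, 19.6],
})

# Each year column becomes a (Year, Value) pair, giving one row per country and year.
long_df = pd.melt(
    wide,
    id_vars=["Country Name", "Country Code"],
    var_name="Year",
    value_name="Value",
)
print(long_df)
```

The `Year` column produced this way holds strings such as `"2015"`, which is why later cells query it with quoted years and cast it to an integer before modelling.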
The 'Value' columns were renamed respectively as follows.
# Renaming the 'Value' columns respectively as follows
Unemployment_long.rename(columns={'Value':'Unemployment %'}, inplace=True)
Inflation_long.rename(columns={'Value':'Inflation %'}, inplace=True)
Population_long.rename(columns={'Value':'Population'}, inplace=True)
GDP_Capita_long.rename(columns={'Value':'GDP Per Capita'}, inplace=True)
Labor_Force_long.rename(columns={'Value':'Labor Force'}, inplace=True)
First the long datasets were stored together into a list and then merged together to form the Complete dataset.
# Storing all the dataframes into a list
dfs = [Unemployment_long, Inflation_long, Population_long, GDP_Capita_long, Labor_Force_long]
from functools import reduce
# Merging the datasets together to form one dataset
Complete = reduce(lambda left, right: pd.merge(left, right, on=['Country Name', 'Country Code', 'Year']), dfs)
Complete
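The chained merge above can be sketched on three tiny tables. The frames and values below are invented for illustration. Because `pd.merge` defaults to an inner join, a country-year missing from any one table is absent from the result.

```python
from functools import reduce

import pandas as pd

keys = ["Country Code", "Year"]

# Hypothetical long-format tables (values invented).
unemployment = pd.DataFrame({
    "Country Code": ["QAT", "ESP"], "Year": ["2015", "2015"], "Unemployment %": [0.2, 22.1],
})
inflation = pd.DataFrame({
    "Country Code": ["QAT", "ESP"], "Year": ["2015", "2015"], "Inflation %": [1.8, -0.5],
})
# Spain is deliberately missing here.
population = pd.DataFrame({
    "Country Code": ["QAT"], "Year": ["2015"], "Population": [2_500_000],
})

# Fold the list pairwise: ((unemployment + inflation) + population).
merged = reduce(
    lambda left, right: pd.merge(left, right, on=keys),
    [unemployment, inflation, population],
)
print(merged)
```

Only the Qatar row survives in this sketch, so comparing the row counts before and after the merge is a quick way to see how many observations the join discards.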
A new dataset, consisting of country names and codes, was downloaded and merged with the Complete dataset for graphing purposes. The new dataset, data.csv, was downloaded from datahub.io through the following -
# Importing country codes from the above dataset
Country_Code=pd.read_csv('data.csv')
Country_Code=Country_Code[['Continent_Name', 'Three_Letter_Country_Code']]
Country_Code.head()
The final dataset, named data, contains all the information needed for visualization.
# Merging the country codes to the Complete dataset
data=pd.merge(Complete, Country_Code, left_on= 'Country Code', right_on= 'Three_Letter_Country_Code', how= 'left')
data = data.drop(labels='Three_Letter_Country_Code', axis=1)
data.head()
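A left merge keeps every row of the Complete dataset, but codes with no match in the lookup table end up with a missing continent. The sketch below, using invented rows and the hypothetical names `complete`, `continents` and `unmatched`, shows one way to list such codes with the `indicator` option of `pd.merge`.

```python
import pandas as pd

# Hypothetical stand-ins for the Complete dataset and the continent lookup (values invented).
complete = pd.DataFrame({
    "Country Code": ["QAT", "ESP", "XKX"],
    "Unemployment %": [0.1, 14.1, 25.7],
})
continents = pd.DataFrame({
    "Continent_Name": ["Asia", "Europe"],
    "Three_Letter_Country_Code": ["QAT", "ESP"],
})

# indicator=True adds a _merge column telling where each row was found.
merged = pd.merge(
    complete,
    continents,
    left_on="Country Code",
    right_on="Three_Letter_Country_Code",
    how="left",
    indicator=True,
)

# Rows found only on the left side have no continent assigned.
unmatched = merged.loc[merged["_merge"] == "left_only", "Country Code"].tolist()
print(unmatched)
```

It may also be worth comparing `len(merged)` with `len(complete)`: if the lookup lists a code more than once, the left merge duplicates those rows.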
# Calculating change in unemployment rates over the given time period for analysis use
data['Change in Unemployment Rates'] = data.groupby('Country Name')['Unemployment %'].pct_change(periods=4)
data
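As a worked example of the calculation above, the sketch below applies a grouped `pct_change(periods=4)` to two invented countries, A and B. With five yearly rows per country, only the 2019 row has a value four periods earlier, so only that row gets a result; all other rows are NaN.

```python
import pandas as pd

# Two hypothetical countries with five yearly observations each (values invented).
rates = pd.DataFrame({
    "Country Name": ["A"] * 5 + ["B"] * 5,
    "Year": ["2015", "2016", "2017", "2018", "2019"] * 2,
    "Unemployment %": [10.0, 9.0, 8.0, 7.0, 5.0, 4.0, 4.5, 5.0, 5.5, 6.0],
})

# Within each country, compare every row with the row four periods earlier.
rates["Change"] = rates.groupby("Country Name")["Unemployment %"].pct_change(periods=4)
print(rates)
```

The result is a fraction rather than a difference in percentage points: country A falls from 10.0 to 5.0, giving -0.5 (a 50% relative decrease), and country B rises from 4.0 to 6.0, giving 0.5.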
# Importing the libraries needed for visualization
import numpy as np
import matplotlib.pyplot as plt
import plotly
import chart_studio.plotly as py
import plotly.graph_objs as go
from plotly.subplots import make_subplots
import plotly.tools as tls
import plotly.express as px
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
%matplotlib inline
### import plotly.offline as py
### py.init_notebook_mode(connected=True)
# Credentials redacted; supply your own Chart Studio username and API key
py.sign_in('YOUR_USERNAME', 'YOUR_API_KEY')
# Displaying a world map depicting the change in unemployment rates over the given time period
# Adding title, dimensions, projection and animation requirements for a better visualization
fig = px.choropleth(data, title="Unemployment Rates Across The World", locations="Country Code", color="Unemployment %",
hover_name="Country Name", animation_frame="Year", range_color=[0, 30],
width=900, height=700, projection='natural earth')
fig.show()
As we play the above visual, we observe a significant increase in unemployment in South Africa, Turkey, Brazil, Sudan and Namibia in the past five years.
We also observe a significant decrease in unemployment in Spain and Greece over the given years.
# Displaying a world map depicting the change in inflation rates over the given time period
# Adding title, dimensions, projection and animation requirements for a better visualization
fig = px.choropleth(data, title="Inflation Rates Across The World", locations="Country Code",
color="Inflation %", hover_name="Country Name", animation_frame="Year",
range_color=[-5, 30], width=900, height=700, projection='natural earth')
fig.show()
As we play the above visual, we observe the following -
# Displaying an animated scatterplot depicting the relationship between labor force and unemployment rate over the given time period
fig = px.scatter(data, x="Labor Force", y="Unemployment %", animation_frame="Year", animation_group="Country Name",
size="Population", color="Continent_Name", hover_name="Country Name", facet_col="Continent_Name", facet_col_spacing=0.03,
log_x=True, size_max=45)
fig.for_each_annotation(lambda a: a.update(text=a.text.split("=")[-1]))
fig.show()
# Displaying an animated scatterplot depicting the relationship between unemployment rate and inflation rate over the given time period
fig = px.scatter(data, x="Inflation %", y="Unemployment %", animation_frame="Year", animation_group="Country Name",
size="Population", color="Continent_Name", hover_name="Country Name", facet_col="Continent_Name", facet_col_spacing=0.03, size_max=45, log_x=True)
fig.for_each_annotation(lambda a: a.update(text=a.text.split("=")[-1]))
# Assign with "=": the original "attribute: True" form is a bare annotation and sets nothing
fig.layout.xaxis.automargin = True
fig.show()
# Displaying an animated scatterplot depicting the relationship between GDP per capita and unemployment rate over the given time period
fig = px.scatter(data, title = "Relationship between Unemployment Rates and GDP Per Capita", x= "GDP Per Capita", y= "Unemployment %", animation_frame= "Year",
animation_group= "Country Name", size= "Population", color = "Continent_Name",
hover_name= "Country Name", log_x=True, size_max=100)
fig.layout.updatemenus[0].buttons[0].args[1]["frame"]["duration"] = 700
fig.show()
# Displaying an animated scatterplot depicting the relationship between GDP per capita and inflation rate over the given time period
fig = px.scatter(data, title = "Relationship between GDP Per Capita and Inflation Rates", x= "GDP Per Capita", y= "Inflation %", animation_frame= "Year",
animation_group= "Country Name", size= "Population", color = "Continent_Name",
hover_name= "Country Name", log_x=True, size_max=100)
fig.layout.updatemenus[0].buttons[0].args[1]["frame"]["duration"] = 700
fig.show()
# Querying unemployment rates by year, sorting them in ascending order and keeping the 15 lowest for each year
top15_unemp_2015 = data.query('Year == "2015"').sort_values(by = 'Unemployment %', ascending = True)[:15]
top15_unemp_2016 = data.query('Year == "2016"').sort_values(by = 'Unemployment %', ascending = True)[:15]
top15_unemp_2017 = data.query('Year == "2017"').sort_values(by = 'Unemployment %', ascending = True)[:15]
top15_unemp_2018 = data.query('Year == "2018"').sort_values(by = 'Unemployment %', ascending = True)[:15]
top15_unemp_2019 = data.query('Year == "2019"').sort_values(by = 'Unemployment %', ascending = True)[:15]
# Graphing subplots to show the top 15 countries with lowest unemployment rates for each year
# Making and positioning subplots corresponding to each year using the above created list
fig = make_subplots(rows=3, cols=2, vertical_spacing = 0.30, column_widths=[0.5, 0.5])
fig.add_trace(go.Bar(x=top15_unemp_2015['Country Name'], y=top15_unemp_2015['Unemployment %'], name='2015'), row=1, col=1)
fig.add_trace(go.Bar(x=top15_unemp_2016['Country Name'], y=top15_unemp_2016['Unemployment %'], name='2016'), row=1, col=2)
fig.add_trace(go.Bar(x=top15_unemp_2017['Country Name'], y=top15_unemp_2017['Unemployment %'], name='2017'), row=2, col=1)
fig.add_trace(go.Bar(x=top15_unemp_2018['Country Name'], y=top15_unemp_2018['Unemployment %'], name='2018'), row=2, col=2)
fig.add_trace(go.Bar(x=top15_unemp_2019['Country Name'], y=top15_unemp_2019['Unemployment %'], name='2019'), row=3, col=1)
# Adding dimensions and plot title
fig.update_layout(height = 800, width=800, title_text= 'Countries with Lowest Unemployment Rates: Top 15')
# Displaying the combined plot
fig.show()
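The five per-year queries above follow a single pattern, so they could also be written as one grouped selection. The following is a minimal sketch under that assumption: the data are invented, `n` is set to 2 instead of 15 to keep the output short, and `lowest_by_year` is a hypothetical name.

```python
import pandas as pd

# Hypothetical long-format data for three countries over two years (values invented).
df = pd.DataFrame({
    "Country Name": ["A", "B", "C", "A", "B", "C"],
    "Year": ["2015"] * 3 + ["2016"] * 3,
    "Unemployment %": [3.0, 1.0, 2.0, 0.5, 4.0, 2.5],
})

n = 2  # the notebook uses 15

# One DataFrame per year holding the n lowest rates, sorted in ascending order.
lowest_by_year = {
    year: group.nsmallest(n, "Unemployment %")
    for year, group in df.groupby("Year")
}

for year, table in lowest_by_year.items():
    print(year, table["Country Name"].tolist())
```

Each entry of the dictionary can then be passed to `go.Bar` in the same way as the separate `top15_unemp_...` variables.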
# Querying inflation rates by year, sorting them in ascending order and keeping the 15 lowest for each year
top15_inf_2015 = data.query('Year == "2015"').sort_values(by = 'Inflation %', ascending = True)[:15]
top15_inf_2016 = data.query('Year == "2016"').sort_values(by = 'Inflation %', ascending = True)[:15]
top15_inf_2017 = data.query('Year == "2017"').sort_values(by = 'Inflation %', ascending = True)[:15]
top15_inf_2018 = data.query('Year == "2018"').sort_values(by = 'Inflation %', ascending = True)[:15]
top15_inf_2019 = data.query('Year == "2019"').sort_values(by = 'Inflation %', ascending = True)[:15]
# Graphing subplots to show the top 15 countries with lowest inflation rates for each year
# Making and positioning subplots corresponding to each year using the above created lists
fig = make_subplots(rows=3, cols=2, vertical_spacing = 0.30, column_widths=[0.5, 0.5])
fig.add_trace(go.Bar(x=top15_inf_2015['Country Name'], y=top15_inf_2015['Inflation %'], name='2015'), row=1, col=1)
fig.add_trace(go.Bar(x=top15_inf_2016['Country Name'], y=top15_inf_2016['Inflation %'], name='2016'), row=1, col=2)
fig.add_trace(go.Bar(x=top15_inf_2017['Country Name'], y=top15_inf_2017['Inflation %'], name='2017'), row=2, col=1)
fig.add_trace(go.Bar(x=top15_inf_2018['Country Name'], y=top15_inf_2018['Inflation %'], name='2018'), row=2, col=2)
fig.add_trace(go.Bar(x=top15_inf_2019['Country Name'], y=top15_inf_2019['Inflation %'], name='2019'), row=3, col=1)
# Adding dimensions and plot title
fig.update_layout(height = 800, width = 800, title_text= 'Countries with Lowest Inflation Rates: Top 15')
# Displaying the combined plot
fig.show()
# Sorting the change in unemployment rates in ascending order, so the largest decreases come first
Change_Unemployment = data.sort_values(by = 'Change in Unemployment Rates', ascending = True)[:15]
# Displaying a bar graph depicting the 15 countries with the largest relative decrease in unemployment rates over the given time period
fig = px.bar(Change_Unemployment, x="Change in Unemployment Rates", y="Country Name", orientation='h', color = 'Continent_Name', title='Change in Unemployment Rates: Top 15')
fig.show()
In this section, a linear regression model has been used to predict Qatar's unemployment rates for the next five years, 2020 to 2024.
# Importing linear model for analysis
from sklearn import linear_model
# Slicing rows relevant for Qatar's unemployment rate predictions
Qatar = data[data['Country Name'] == 'Qatar']
Qatar
First, the Year column has been converted to an integer type. X and y columns have been reshaped for modelling purposes.
# Changing the datatype of the Year variable
# Storing the Year and Unemployment Rates in X and y respectively and reshaping for modelling purposes
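# .copy() gives an independent frame, so reassigning Year does not trigger SettingWithCopyWarning
Qatar_new = Qatar[['Year', 'Unemployment %']].copy()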
Qatar_new['Year'] = Qatar_new['Year'].astype('int')
X = Qatar_new.iloc[:, 0].values.reshape(-1, 1)
y = Qatar_new.iloc[:, 1].values.reshape(-1, 1)
As shown below, a model was fitted and the predicted values have been stored in the "predicted" variable.
# Applying the model
regr = linear_model.LinearRegression()
# Fitting the model
regr.fit(X,y)
# Obtaining the predicted values
predicted = regr.predict(X)
A plot is drafted to observe the actual and predicted values of the nation's unemployment rates for the given years.
# Creating a plot to observe the actual and predicted values for the given time period
# Setting plot size
fig = plt.figure(figsize=(14,6))
# Setting plot type
plt.scatter(X, y)
# Assigning the color red to the trendline
plt.plot(X, predicted, color = 'red')
# Assigning tick range on the x-axis
plt.xticks(np.arange(2015, 2020, 1))
# Assigning a label to the x-axis
plt.xlabel("Years")
# Assigning a label to the y-axis
plt.ylabel("Unemployment Rates")
# Assigning a plot title
plt.title("Qatar's Unemployment Rates: 2015-2019")
A score is calculated to judge the reliability of the model.
# Calculating the score to judge the model's reliability
regr.score(X, y)
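For a linear regression, the score returned by `regr.score` is the coefficient of determination, R². The sketch below recomputes it by hand on invented values using NumPy only; `np.polyfit` is swapped in for scikit-learn here to keep the example self-contained, and the figures are not Qatar's actual rates.

```python
import numpy as np

# Hypothetical yearly rates (values invented for illustration).
years = np.array([2015, 2016, 2017, 2018, 2019], dtype=float)
rate = np.array([0.20, 0.15, 0.14, 0.11, 0.10])

# Ordinary least-squares line: rate is approximated by slope * year + intercept.
slope, intercept = np.polyfit(years, rate, deg=1)
fitted = slope * years + intercept

# R^2 = 1 - (residual sum of squares) / (total sum of squares).
ss_res = np.sum((rate - fitted) ** 2)
ss_tot = np.sum((rate - rate.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(round(slope, 4), round(r_squared, 3))
```

Because the score is computed on the same five points used for fitting, it describes how well the line follows the observed years and says little about how reliable the forecasts for later years will be.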
Now, predictions are made for the next five years as shown below and a plot is drafted.
# Creating a list of the next five years
years=[2020, 2021, 2022, 2023, 2024]
# Storing the list into an array and reshaping the same
future=np.array(years).reshape(-1, 1)
# Predicting the future unemployment rates
future_unemployment=regr.predict(future)
# Creating a plot to observe the actual and predicted values for the given time period
fig=plt.figure(figsize=(14,6))
plt.scatter(X, y, label='Actual')
plt.plot(X, y, color = 'Red')
# Creating a plot to observe the predicted values for the next five years
plt.scatter(years, future_unemployment, marker='o', label='Forecasted')
plt.xticks(np.arange(2015, 2025, 1))
plt.xlabel("Years")
plt.ylabel("Unemployment Rates")
plt.title("Predicting Qatar's Unemployment Rates for the next 5 years: 2020-2024")
plt.legend()
We follow the above steps to predict the nation's inflation rates for the next five years.
# Changing the datatype of the Year variable
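# Storing the Year and Inflation Rates in X and y respectively and reshaping for modelling purposes
# .copy() gives an independent frame, so reassigning Year does not trigger SettingWithCopyWarning
Qatar_new = Qatar[['Year', 'Inflation %']].copy()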
Qatar_new['Year'] = Qatar_new['Year'].astype('int')
X = Qatar_new.iloc[:, 0].values.reshape(-1, 1)
y = Qatar_new.iloc[:, 1].values.reshape(-1, 1)
# Applying the model
regr_inf = linear_model.LinearRegression()
# Fitting the model
regr_inf.fit(X,y)
# Obtaining the predicted values
predicted_inf = regr_inf.predict(X)
# Creating a plot to observe the actual and predicted values for the given time period
fig = plt.figure(figsize=(14,6))
plt.scatter(X, y)
plt.plot(X, predicted_inf, color = 'red')
plt.xticks(np.arange(2015, 2020, 1))
plt.xlabel("Years")
plt.ylabel("Inflation Rates")
plt.title("Qatar's Inflation Rates: 2015-2019")
# Calculating the score to judge the model's reliability
regr_inf.score(X, y)
# Creating a list of the next five years
years=[2020, 2021, 2022, 2023, 2024]
# Storing the list into an array and reshaping the same
future=np.array(years).reshape(-1, 1)
# Predicting the future inflation rates
future_inflation=regr_inf.predict(future)
# Creating a plot to observe the actual and predicted values for the given time period
fig=plt.figure(figsize=(14,6))
plt.scatter(X, y, label='Actual')
plt.plot(X, y, color = 'Red')
# Creating a plot to observe the predicted values for the next five years
plt.scatter(years, future_inflation, marker='o', label='Forecasted')
plt.xticks(np.arange(2015, 2025, 1))
plt.xlabel("Years")
plt.ylabel("Inflation Rates")
plt.title("Predicting Qatar's Inflation Rates for the next 5 years: 2020-2024")
plt.legend()
To conclude, it can be stated that we can expect both Qatar's unemployment and inflation rates to steadily decline in the future. It is important to note that Qatar's low unemployment rate can be attributed to the following -
On the other hand, Qatar's low inflation rate can be attributed to the following -
From this analysis, we can conclude that Qatar is a suitable example of a country where low unemployment and deflation co-exist. Research has revealed that a similar trend can be observed in other countries in the Middle East, such as Saudi Arabia and the United Arab Emirates. Additionally, research has revealed that the coronavirus pandemic has caused a further slump in consumer prices in the Middle East. The average consumer in the region continues to cut expenses and remain frugal.
However, for most countries across continents, we observe that over the given period of time, unemployment rates decreased with an increase in inflation rates.
The learning process has been both interesting and challenging. The main lesson I have learned is the importance of good visualizations. Through this project, I have become well-versed in creating effective visualizations using plotly.